{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating reference CSVs for model training and inference\n",
    "\n",
    "When you train models with `solaris`, it uses reference CSV files to find images and matching labels. Let's go through what those are and what they should include. You'll create (up to) three different reference files:\n",
    "\n",
    "- [Training data](#Training-Data-CSV): Required for Training\n",
    "- [Epoch-wise validation data](#Validation-Data-CSV): Optional\n",
    "- [Inference data](#Inference-Data-CSV): Required for inference\n",
    "- [Using these files](#Using-these-files)\n",
    "\n",
    "## Training Data CSV\n",
    "\n",
    "Your training data CSV must have two columns with the __exact__ names below:\n",
    "\n",
    "- __image__: The `image` column defines the paths to each image file to be used during training, one path per row. You can use either the absolute path to the file or the path relative to the path that you run code in - we recommend using the absolute path for consistency.\n",
    "- __label__: The `label` column defines the paths to the label (mask) files. If you need to create masks first, [check out the Python API tutorial](api_masks_tutorial.ipynb) or the [CLI tutorial](../cli_mask_creation.html).\n",
    "\n",
    "__The image and label in each row must match!__ This is how `solaris` matches your training images to the expected outputs.\n",
    "\n",
    "If you choose to have `solaris` split validation data out for you, it will randomly select a fraction of the rows for validation. The fraction used for validation is defined in the config YAML file - for more on how to do so, [see the YAML config reference](creating_the_yaml_config_file.ipynb).\n",
    "\n",
    "For more control over what data is used for training vs. validation, you can create a separate validation CSV.\n",
    "\n",
    "## Validation Data CSV\n",
    "\n",
    "This CSV is the same as the Training Data CSV, but specifies images and masks to be used for epoch-wise validation. Make sure there's no overlap between your training and validation sets - you don't want any data leaks! If you want `solaris` to split the validation data out of the training data automatically, you don't need to provide this.\n",
    "\n",
    "## Inference Data CSV\n",
    "\n",
    "This reference file points to the image files that you wish to make predictions on. It therefore only needs to contain one column: __image__.\n",
    "\n",
    "## Using these files\n",
    "\n",
    "Once you have made these labels, provide the paths to them [in your configuration file](creating_the_yaml_config_file.ipynb); they'll automatically be loaded into your config when you call [solaris.utils.config.parse()](../../api/utils.rst#solaris.utils.config.parse)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "solaris",
   "language": "python",
   "name": "solaris"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
